The applicability of the Rasch model to data from a
typical personality test, the High School Personality Questionnaire
(HSPQ), was studied. The data were gathered on Junior High and High
School students in the Louisville Public Schools (Kentucky). Item
easinesses and person abilities were estimated and compared by age
group, within each age group, and across two points in time for the
older age group. In addition, certain results from the Rasch analyses
were compared with those of factor analysis. A sample of 1,000
students was taken from each of the groups (Junior High, 7th graders;
Senior High, 11th and 12th graders). Results of the study are related
to the five questions considered. The first question was whether or
not there were patterns of fit to the Rasch model when responses are
dichotomized in different ways. The results indicated that no single
key was superior to others in producing fit. The second question
concerned the fit of the model for the data considered; it was found
that there was frequently lack of fit, but it is noted that the test
statistic was conservative. The third question related to the
stability of item easiness estimates within a group and across two
points in time for that group. The conclusion was that different item
easinesses are obtained when different degrees of possession of the
trait are focused upon. The fourth question was how stable the results
of the tests of fit are across time; in pre- and post-comparisons of
fit, 55% were in agreement. The fifth question concerned how the item
mean squares are related to factor loadings; in almost all cases, the
item with the highest mean square was also the item with the lowest
loading. (Author/DB)
Introduction

The Rasch measurement model is a mathematical statement of the
odds of a correct response when a person of a certain ability
encounters an item of a given easiness. The attractive property of
this model is that item easinesses and person abilities can be
estimated independently. Thus, the term "sample-free" has been applied
to the model. The model was conceived (Rasch, 1960) for application to
ability data. Thus most treatments of the model speak of binary
(correct vs. incorrect) responses to items. However, there has been
little application of the model to personality test data, which
typically are less reliable than ability test results.

The purpose of this paper is to study the applicability of the
Rasch model to data from a typical personality test, the High School
Personality Questionnaire (HSPQ). The data were gathered on Junior
High and High School students in the Louisville Public Schools.
Item easinesses and person abilities (these terms are defined below)
were estimated and compared: by age group (Junior High students and
Senior High students); within each age group (cross-validation); and
across two points in time (beginning vs. end of the 1970-71 school
year) for the older age group in order to look at the stability of
the estimates. In addition, certain results from the Rasch analyses
were compared with those of factor analysis.

Before discussing specific procedures of this paper, however,
it will be useful to review some important aspects of the Rasch
model.

The responses to ability test items are usually scored as correct
(+) or incorrect (-). Personality tests, however, typically have item
alternates which are weighted in varying degrees, such that choice
of an alternate with a larger weight results in a larger score on a
particular dimension for the person. Thus, there is no "correct"
alternate, but alternates of different strengths with respect to the
personality trait measured. One solution to the problem of
non-dichotomous scores on a personality test is to force a binary
scoring routine. This procedure was used in several forms in this
paper. The results are described below. Though a formulation of the
model which permits polychotomous responses would be the optimal
analytic tool here, it has not been fully developed to date. The
results presented here may be of some interest to persons working on
the polychotomous form of the model.

Basic Assumptions of the Model

Scoring the responses as + or - for person v on item i of a
unifactor test allows us to represent a correct response as
$a_{vi} = +$. The probability of a correct response, then, is stated
as $p\{+ \mid v, i\}$.

The first assumption of the Rasch measurement model is given as

\[ p\{+ \mid v, i\} = \frac{\lambda_{vi}}{1 + \lambda_{vi}}, \qquad \lambda_{vi} > 0. \]

Here $\lambda_{vi}$ is the odds of success on item i for person v.

The second assumption of the model is that

\[ \lambda_{vi} = \xi_v \varepsilon_i, \qquad \xi_v \geq 0 \text{ and } \varepsilon_i \geq 0, \]

where $\xi_v$ is the ability of person v and $\varepsilon_i$ is the
easiness of item i. This equation states that the odds of person v
responding correctly to item i are the product of the person ability
($\xi_v$) and item easiness ($\varepsilon_i$) components.

Let $a_{vi} = 0$ if person v responds incorrectly to item i and
$a_{vi} = 1$ if person v responds correctly to item i. Then, given the
above assumptions, the probability of a response for person v and
item i is given as

\[ p\{a_{vi} \mid v, i\} = \frac{(\xi_v \varepsilon_i)^{a_{vi}}}{1 + \xi_v \varepsilon_i}, \]

which for a correct response ($a_{vi} = 1$) reduces to
$\xi_v \varepsilon_i / (1 + \xi_v \varepsilon_i)$.

The third assumption of the model is that all answers, given the
parameters, are stochastically independent.
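
To make the first two assumptions concrete, a minimal Python sketch
follows. It is an illustration only and is not part of the original
analysis; the function names are hypothetical.

```python
def rasch_odds(ability, easiness):
    """Odds of a correct response: person ability times item easiness."""
    if ability < 0 or easiness < 0:
        raise ValueError("ability and easiness must be non-negative")
    return ability * easiness


def rasch_probability(ability, easiness):
    """Probability of a correct response, odds / (1 + odds)."""
    odds = rasch_odds(ability, easiness)
    return odds / (1.0 + odds)


def response_probability(a, ability, easiness):
    """Probability of the observed response a (1 = correct, 0 = incorrect)."""
    odds = rasch_odds(ability, easiness)
    return odds ** a / (1.0 + odds)
```

For example, a person of ability 2 meeting an item of easiness 3 has
odds of 6 and therefore a probability of 6/7 of responding correctly.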

Estimates of the Parameters of the Model 

There are different procedures (see Rasch, 1960; Bramble, 1970;
Wright and Panchapakesan, 1968) for obtaining estimates and standard
errors of person abilities and item easinesses.

The maximum likelihood estimation procedures (Wright and
Panchapakesan, 1968) involve obtaining iteratively a solution to the
implicit equations

\[ a_{+i} = \sum_{j=1}^{k-1} r_j \, \frac{\exp(b_j^* + d_i^*)}{1 + \exp(b_j^* + d_i^*)}, \qquad i = 1, 2, \ldots, k, \]

\[ j = \sum_{i=1}^{k} \frac{\exp(b_j^* + d_i^*)}{1 + \exp(b_j^* + d_i^*)}, \qquad j = 1, 2, \ldots, k - 1, \]

where
    $a_{+i}$ = number of persons who get item i correct (item score)
    $j$ = the score; an ability estimate is obtained for each score
    $r_j$ = number of persons in score group j
    $b_j^*$ = ability estimate
    $d_i^*$ = easiness estimate

The Newton-Raphson procedure is used to solve for the unknown
parameter estimates.

An approximation to the variance of item estimates is given as:

\[ V(d_i^*) = 1 \Big/ \sum_{j} \frac{r_j \exp(b_j^* + d_i^*)}{\left(1 + \exp(b_j^* + d_i^*)\right)^2}. \]

An approximation to the variance of ability estimates is given as:

\[ V(b_j^*) = \frac{1}{C(b_j^*) \exp(b_j^*)} + \frac{1}{C^2(b_j^*)} \sum_{i} V(d_i^*) \left( \frac{\exp(d_i^*)}{\left(1 + \exp(d_i^* + b_j^*)\right)^2} \right)^2, \]

where

\[ C(b_j^*) = \sum_{i} \frac{\exp(d_i^*)}{\left(1 + \exp(b_j^* + d_i^*)\right)^2}. \]
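
As an illustration, the two variance approximations can be evaluated
directly once estimates are in hand. The Python sketch below is not
the authors' program; the function names are hypothetical, and the
name `c_term` stands for the auxiliary sum that appears in the ability
variance.

```python
import math


def item_variance(d_i, abilities, group_counts):
    """Approximate variance of one item easiness estimate."""
    total = 0.0
    for b_j, r_j in zip(abilities, group_counts):
        e = math.exp(b_j + d_i)
        total += r_j * e / (1.0 + e) ** 2
    return 1.0 / total


def c_term(b_j, easinesses):
    """Auxiliary sum over items used in the ability variance."""
    return sum(math.exp(d_i) / (1.0 + math.exp(b_j + d_i)) ** 2
               for d_i in easinesses)


def ability_variance(b_j, easinesses, item_variances):
    """Approximate variance of one score-group ability estimate."""
    c = c_term(b_j, easinesses)
    first = 1.0 / (c * math.exp(b_j))
    second = sum(v * (math.exp(d_i) / (1.0 + math.exp(d_i + b_j)) ** 2) ** 2
                 for d_i, v in zip(easinesses, item_variances)) / c ** 2
    return first + second
```

The first term of the ability variance is the reciprocal of the test
information at that ability; the second term carries forward the
uncertainty in the item estimates.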



The fit of an item to the model may be investigated by forming
standard deviates of the score group × items matrix [defining
equations illegible in the source].

The mean square for a particular item may then be the criterion by
which one can determine fit (small mean square) or lack of fit (large
mean square).

The overall test of fit is made using a chi-square statistic,
a conservative test. It is obtained by summing all squared unit normal
deviates,

\[ \chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{k-1} z_{ji}^2, \]

with df = (score groups - 1)(items - 1), where $z_{ji}$ denotes the
deviate for score group j and item i. A likelihood ratio statistic
may also be used as a test for fit.
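
The fit computations can be sketched in Python as follows. Because the
defining equations for the deviates did not survive in this copy, the
sketch rests on two assumptions that should be checked against Wright
and Panchapakesan: that each deviate is the usual binomial
standardization (observed minus expected, divided by the binomial
standard deviation), and that an item mean square is the item's sum of
squared deviates divided by (score groups - 1). All function names are
hypothetical.

```python
import math


def standard_deviates(observed, group_counts, abilities, easinesses):
    """observed[j][i] = persons in score group j answering item i correctly."""
    deviates = []
    for a_row, r_j, b_j in zip(observed, group_counts, abilities):
        if r_j == 0:
            continue  # an empty score group contributes nothing
        row = []
        for a_ji, d_i in zip(a_row, easinesses):
            p = 1.0 / (1.0 + math.exp(-(b_j + d_i)))
            # Assumed form: binomial standardization of the observed count.
            row.append((a_ji - r_j * p) / math.sqrt(r_j * p * (1.0 - p)))
        deviates.append(row)
    return deviates


def item_mean_squares(deviates):
    """Sum of squared deviates per item over (score groups - 1); assumed."""
    groups = len(deviates)
    items = len(deviates[0])
    return [sum(deviates[j][i] ** 2 for j in range(groups)) / (groups - 1)
            for i in range(items)]


def overall_chi_square(deviates):
    """Sum of all squared deviates and its degrees of freedom."""
    groups = len(deviates)
    items = len(deviates[0])
    chi_square = sum(z * z for row in deviates for z in row)
    return chi_square, (groups - 1) * (items - 1)
```

With nine non-empty score groups and ten items, as in the subtests
analyzed here, the degrees of freedom are 8 × 9 = 72.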

We now turn to the basic questions posed by this paper. They
may be summarized as follows:

1. Are there patterns of fit to the Rasch model when responses
   are dichotomized in different ways?

   For the data considered in this paper, the question is:
   should all responses which indicate a positive amount of
   the trait, only responses suggesting an extreme possession
   of the trait, or responses indicating a moderate level on
   the trait be scored as "correct"?
2. How well does the Rasch model fit for the data considered?

   Is there generally a greater degree of fit for one age group?
   Does a particular scoring key result in greater fit?
3. How stable are the estimates of item easinesses both within
   a group and across two points in time for that group?
Method

The Sample

In an experimental project involving public schools in the
Louisville Public Schools, the HSPQ was administered to approximately
6,000 students, selected from grades one through twelve. The pretest
data were obtained during September, 1970 and posttest data were
obtained during May, 1971. For this study, the data were obtained from
two general groups: Junior High students (seventh graders) and Senior
High students (eleventh and twelfth graders). A sample of 1000
students was taken from each group. So that cross-validations could be
performed within each group, these samples were arbitrarily split into
two samples of 500 each. In the paper, we shall refer to the first and
second samples of the Junior High group and the first and second
samples of the Senior High group, though "first" and "second" are
merely labels.

Instrumentation 

The personality test used in the study was the IPAT HSPQ. This
instrument has 14 subtests, each of which measures a different trait.
One subtest (B), having one correct answer for an item, is an ability
test which, its authors report, measures "crystallized" as opposed to
"fluid" general ability. The former (crystallized) kind of ability
is thought to change over time and broadly refers to one's gradual
acquisition of information (e.g., vocabulary). The latter type of
ability (fluid) refers to abilities thought to be more stable, such
as those required to see analogies or deduce a certain conclusion
from given information. There is occasion, at a later point, to
describe two other subtests. Though the remaining eleven subtests
are not described, their titles are reported in Table 1. These scales
are designed to measure various other aspects of personality. Unlike
subtest B, these scales contain items having alternatives representing
three levels of response rather than having alternatives which are
merely "right" and "wrong". Thus the items are trichotomous but the
alternatives are assumed to be ordered in terms of strength.

Each subtest is comprised of ten items, each item having three
alternates. The HSPQ scoring system is such that the weight of each
alternate in its contribution to the total subtest score is either
zero, one, or two. The weights for the single alternate per item
that a person chooses, then, are summed to give his score. For a
ten-item subtest, therefore, the highest possible score is 20.

Because of the necessity for dichotomous data, the 0, 1, or 2
responses were transformed to zero-one data. This was accomplished
using the following keys: key-1 accepted only the alternates weighted
one (middle alternatives) as "correct"; key-2 accepted only the
alternates weighted two (extreme alternatives) as "correct"; and
key-1,2 accepted either the first or the second kind of alternate (any
positive response = 1). To clarify, the responses weighted two
represent alternates that characterize more strongly the trait
measured; responses weighted one characterize with less strength the
trait measured. For comparison, each of these keys was used in scoring
each of the 14 subtests.
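
The three keys amount to a simple recoding of the item weights, as the
following Python sketch shows. It is an illustration only; the names
`KEYS`, `dichotomize`, and `hspq_subtest_score`, and the example
response vector, are hypothetical.

```python
# Weights accepted as "correct" (scored 1) under each key.
KEYS = {
    "key-1": {1},       # middle alternatives only
    "key-2": {2},       # extreme alternatives only
    "key-1,2": {1, 2},  # any positive response
}


def dichotomize(weights, key):
    """Recode a person's 0/1/2 item weights to zero-one data under a key."""
    accepted = KEYS[key]
    return [1 if w in accepted else 0 for w in weights]


def hspq_subtest_score(weights):
    """Original subtest score: the sum of the chosen alternates' weights."""
    return sum(weights)


# One hypothetical person's weights on a ten-item subtest.
person = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1]
binary_by_key = {key: dichotomize(person, key) for key in KEYS}
```

Note that a person's key-1 and key-2 scores always sum to the key-1,2
score, since each positive response is counted by exactly one of the
first two keys.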
Results and Discussion

Table 1 shows probabilities of fit, first for the chi-square test and
second for the likelihood ratio test, for all samples and subtests.
Before looking at particular subtests, some general trends should
be noted. Though measures of skewness were not obtained, the general
tendency observed was for key-1 to produce a positively skewed
distribution of scores, key-2 to produce a distribution less
positively skewed (or approximately normal), and for key-1,2 to
produce a negatively skewed distribution. Recalling that a score group
refers to all persons who obtained a certain score and that there are
nine possible score groups for each subtest here (score groups zero
and ten are discarded because they do not contribute to the analysis),
it was observed that key-1 analysis sometimes resulted in as many as
three empty score groups. The empty groups for this key were always
at the upper end (i.e., either nine, eight and nine, or seven, eight,
and nine). To be more specific about skewness, 80 to 90 per cent of
the responses typically were contained in either score groups one
through five or five through nine. Key-2 generally produced better
fit to the model, though there are variations on this conclusion for
individual subtests. Notice also that subtest B, the ability test,
produced consistent lack of fit statistics. These distributions
were generally negatively skewed (i.e., the items were easy).
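
The formation of score groups described above can be sketched in
Python. The sketch is illustrative only; the function names are
hypothetical.

```python
from collections import Counter


def score_groups(binary_responses, n_items=10):
    """Count persons at each raw score from 1 to n_items - 1.

    Scores of zero and n_items are discarded, as in the text, because
    they do not contribute to the analysis.
    """
    counts = Counter(sum(person) for person in binary_responses)
    return [counts.get(score, 0) for score in range(1, n_items)]


def empty_groups(group_counts):
    """Raw scores (starting at 1) whose score group contains no persons."""
    return [score for score, n in enumerate(group_counts, start=1) if n == 0]
```

For a ten-item subtest this yields the nine score groups referred to
above, and `empty_groups` identifies cases such as the empty upper
groups observed under key-1.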

In order to obtain information relevant to questions one and two
(regarding results with different keys), the total numbers of subtest
fits are shown in the margins of Table 2. The right margin shows total
fits by key for each subtest. Analyses for subtests D and Q3 generally
resulted in better fit for all keys. Analyses of subtests H and I
re- [...] moderate agreement from one sample to another, for either
subtests D or I. There are exceptions. For example, it is seen that
there is one case in which agreement from one sample to the other is
quite high. This is in the case of subtest I, key-2 for the Junior
High group; only the items positioned four and five are inverted.

The other case where there is considerable agreement of item orders
from the first sample to the second sample is for key-1,2 in subtest D
of the Senior High sample. Different items change rank in this case,
however. The statistics for fit for these two analyses were .008 and
.211. Interestingly, subtest I resulted generally in more consistent
item orders, by key, than did subtest D, the better fitting subtest.

Subtest B for the Senior High samples had only one inversion,
items nine and seven. There was less agreement of orders for a
particular key for the Junior High samples. Agreement across samples
within the Senior High group, however, was somewhat better. Note that
items five and seven through ten did remain generally on the difficult
end of the groups. The consistency, by key, from the Junior High to
the Senior High sample, for subtests D and I, was generally no better
than that achieved within either group. And no particular key gives
better results than the others.

Now let us look at the log easiness estimates themselves,
comprising the second columns of Table 2. Only easiness estimates for
selected analyses will be discussed here. The reader may investigate
for himself the remaining cases. Observe the easiness estimates for
key-2, first and second samples of the Junior High group. Notice that
items four and five inverted in rank from the first to the second
sample. Note also, however, that the easiness estimates for these
items were similar for the first sample, becoming more alike in the
second sample. The remaining estimates agree quite closely.

In the other case for which there was only one inversion of
rank (subtest D, key-1,2, Senior High group), the same tendency
regarding similarity of easiness estimates existed. That is, items
nine and six were fairly close in easiness on the first sample and
also on the second, the difference being that their positions were
reversed in the second sample. In other words, for the two examples
related here, one might explain the reversals in easiness order by
reference to the closeness of the easinesses, or by noting that the
standard error of each estimate includes the other easiness estimate.

Finally, with respect to Table 2, Kuder-Richardson reliability
coefficients (KR-20) obtained for each subtest, by key, are shown.
For the poorly-fitting subtest I, key-2 obtains higher coefficients
with just one exception. Of the KR-20's for subtests D and I, .535
was the highest, with probability of fit being less than .001. The
KR-20's for subtest B are not particularly higher, though they do
tend to exceed the others slightly.
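
For reference, a KR-20 coefficient can be computed from zero-one data
as in the Python sketch below. It assumes the usual textbook form of
the coefficient with the variance of total scores computed over N (not
N - 1); the paper does not state which variance its program used. The
function name is hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items 0/1 matrix."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    mean = sum(totals) / n
    # Assumption: population variance (divide by n) of the total scores.
    variance = sum((t - mean) ** 2 for t in totals) / n
    pq_sum = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n
        pq_sum += p * (1.0 - p)
    return k / (k - 1) * (1.0 - pq_sum / variance)
```

Two items that always agree give a coefficient of 1, while two items
whose total-score variance equals the sum of their item variances give
a coefficient of 0.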

We now report analyses regarding question four, related to the
Rasch analysis over time. First the posttest data analyses are
presented and discussed. Then, pretest results are compared with
these, in terms of fit statistics and item easiness orders.

We shall use the posttest data for only the Senior High group.
This group contained the samples which tended to produce more
consistent orders of easiness estimates.

Table 3 shows probabilities for the tests of fit. The format
is the same as for Table 1. It can be seen that no particular key
results in much better fit than another, results which are similar
to those of the pretest data. When we return to Table 1 and tally
the number of fitting subtests by key, for just the Senior High group,
the results are 12, 18, and 18 for the respective keys. Here, however,
the first key produced a slightly larger number of non-significant
tests. Again, as with the pretest data analyses reported in Table 1,
subtest D obtains overall better fit and subtest B results show
consistent lack of fit, as does subtest I.

To shed light on the question regarding consistency of fit to the
model from one point in time to another, Table 4 was constructed.
Non-fitting subtests are indicated by "L" and fitting subtests simply
by "F", from pre to post in each case. No single key gives more
consistent results. Subtest [illegible] gives the most consistent
results across all keys from pre- to posttests for either sample.
Subtests D, O, and Q2 show only moderate consistency, especially when
key-1,2 is used for the latter two tests.
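
The pre-to-post comparison reduces to counting matching labels. A
minimal Python sketch follows; the function name and the label
sequences in the usage note are hypothetical and are not values from
Table 4.

```python
def fit_agreement(pre, post):
    """Percentage of analyses with the same label ("F" or "L") pre and post."""
    if len(pre) != len(post):
        raise ValueError("pre and post must list the same analyses")
    matches = sum(1 for a, b in zip(pre, post) if a == b)
    return 100.0 * matches / len(pre)
```

For instance, the made-up sequences "FFLL" (pre) and "FLLF" (post)
agree in two of four analyses, giving 50 per cent agreement.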

The second part of question four related to the consistency of
item easiness orders over two points in time. Subtests B, D, and I
again were selected so that posttest results can be compared with
those already reported. Table 5 contains the relevant information, in
the same format as Table 2. First, we note characteristics within the
posttest data; then we compare easiness estimates from pre- to
post-data.

First, let us look at the columns which contain the ranks of the
easiness estimates. The rank agreement for the different keys within
each sample is usually lacking or inverse. However, agreement across
samples by key is high in three cases. One case involves subtest B,
in which the positions of items four and two and items nine and seven
are inverted from the first to the second sample. The other two cases
relate to key-1,2 of subtests D and I. While this key produced
moderate correspondence of item orders for subtest I, high agreement
is seen across samples for the D subtest. These results are much like
those reported in Table 2, which contains pretest analyses.

The item easiness rank correspondence from pretest to posttest is
somewhat higher. This is especially true for the second sample, where
key-2 and key-1,2 result in high correspondence for subtests D and I.
For the single key used with subtest B, it is seen that item easiness
rankings are rather consistent across all Senior High samples, with
items two and four and seven and nine having some tendency toward
inversion. The item orders of the Junior High samples agree more with
themselves than with either pretest or posttest samples within the
Senior High group. At the same time, it is recalled that easiness
orders are more stable within the latter group than the former.

The KR-20's obtained in the posttest analysis are consistently
higher, by about .10, for subtest D. For subtest I, key-1 of the
first sample produced a sizeable increase (.266 to .436); otherwise
there was no discernible pattern.

The final analysis is related to question five. Here, we want
to investigate the relationship between mean squares obtained for
each item using the Rasch model and loadings of each item resulting
from a factor analysis.

A requirement of the Rasch model is that the test have a factor
structure that is unitary. One aspect of the factor analytic method
referred to as "unrestricted maximum-likelihood factor analysis"
(Jöreskog, 1967) is that one may test a hypothesis that the data can
be accounted for by a certain number of factors. In this analysis,
therefore, this factor analytic technique will be used to test the
hypothesis (α > .01) that a unifactor solution can be obtained for
each of the selected subtests. We shall observe in some detail
the results for subtest B. We will then look at the overall outcome
of the analyses of subtests D and I, then compare mean squares and
loadings in a separate table.

Some cautions regarding the maximum likelihood method should be
mentioned. The method should be used when the distribution of the
variables is multivariate normal. For variables composed of responses
to a single item, this is not a valid assumption. Choosing between
this procedure and classical procedures, however, we decided to use
the former in factoring the correlation matrix.

Table 6 contains, for each subtest, the factor loadings for
each item for a single factor solution for each key. The
probability for a two-factor solution is also given, though loadings
on the two factors are not listed.

Look at the results in Table 6 for subtest B. We have seen that
the Rasch model consistently does not fit these data, though item
rankings and easiness estimates across the High School samples are
frequently more consistent than for other, better fitting subtests.
This table shows that the first sample did produce a unifactor test,
but that the second sample did not. Two items, five and eight, had
near zero loadings in both analyses. (Recall that it was items seven
and nine which were inverting orders from one sample to another in
the Rasch analysis.) If we now return to the Rasch output and check
the mean squares of the items for these samples, it is seen that while
most items have mean squares at acceptable levels (e.g., slightly
above 1.0), the statistics for these items are quite large, being
nine and twenty-three, respectively, for the first sample and
seventeen and twenty for the second sample. Checking the proportions
of correct responses to these items shows they are the most difficult,
falling roughly at .35 and .22 for both samples. The authors
rechecked the subtest key constructed by IPAT and found no errors.
A short description of these two items may be useful.

Item five is an analogy item, paraphrasing: part is to half as
parent is to ____. The alternates are (the correct one is underlined
in the original) "grandfather," "father," and "son." Item eight
requires a deduction, again paraphrasing: given five coins with three
of them bent and four of them silver, how many silver coins must be
bent? Alternate responses are: one, two, or three.

There is no reason to suspect that the content of these items
differs greatly from the other items of subtest B. However, it is
likely that these items are too difficult for this group.

The results for subtest B tend to show more unifactor solutions
than the results for subtest I. No clear, overall trend in the
statistics, by key, is apparent.

Table 7 shows, for each subtest, the items (first column) ranked
according to mean square and the same set of items (second column)
ranked by factor loading. Mean squares are ranked from low (top) to
high (bottom) but loadings from high (top) to low (bottom). Thus,
the first row contains the item (1) with the lowest mean square and
the item (3) with the highest loading.

Another question of concern in this paper is related to a
comparison of items based on the mean square fits obtained in the
Rasch procedure and the loadings resulting from the factor analyses.
Table 7 contains items ordered according to mean squares and loadings
for both Senior High samples, for the single key in the case of
subtest B and for three keys in the case of subtests D and I.

It can be seen in Table 7 that in 11 of the 14 pairs of analyses
the item with the largest mean square corresponds to the item with the
smallest loading. From this extreme, the number of items that
correspond tends to decrease and become further apart in rank, so that
for most of the items there tends to be no particular pattern
apparent. No particular key results in an order any more consistent
across samples than another. Nor do different keys within a sample
give similar orderings. Although it may not be apparent at first,
closer inspection of Table 7 will show that subtest D tends to produce
orders of mean squares and loadings that are closer together.

The results in Table 7 suggest rather strongly that items with
large mean squares tend also to be the items that have small loadings
on the factor. But this trend is not discernible as the items become
closer in mean square value and in size of factor loading.
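
The two orderings compared in Table 7 can be produced as in the
following Python sketch. The function names are hypothetical, and the
values in the usage note are made up for illustration; they are not
taken from Table 7.

```python
def rank_by_mean_square(mean_squares):
    """Item labels ordered from lowest to highest mean square."""
    return sorted(mean_squares, key=mean_squares.get)


def rank_by_loading(loadings):
    """Item labels ordered from highest to lowest factor loading."""
    return sorted(loadings, key=loadings.get, reverse=True)


def worst_items_agree(mean_squares, loadings):
    """True if the largest mean square and smallest loading are the same item."""
    return rank_by_mean_square(mean_squares)[-1] == rank_by_loading(loadings)[-1]
```

With made-up mean squares {1: 0.9, 2: 1.1, 3: 9.0} and loadings
{1: 0.5, 2: 0.6, 3: 0.02}, the two orderings differ at the top but
agree on item 3 at the bottom, which is the pattern described above.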
Summary

Though the Rasch model was conceived for application to ability
tests, the authors undertook to apply the procedure to a personality
test, the HSPQ. Though a polychotomous model would have been more
appropriate for these kinds of data, the unavailability of such a
model required the authors to consider different scoring procedures
in order to make use of the model for dichotomous responses.

There were five questions considered. The first related to
whether or not there were patterns of fit to the Rasch model when
responses are dichotomized in different ways. The results indicated
that no single key was superior to others in producing fit.

The second question was concerned with fit of the model for the
data considered. Frequently, there was lack of fit, though it should
be noted that the test statistic was a conservative one. For the
pretest data roughly 66% of the analyses resulted in fit. For the
posttest data, approximately 54% of the analyses produced fit to the
model.

The third question related to the stability of item easiness
estimates within a group and across two points in time for that group.
The conclusion was rather clear that different item easinesses are
obtained when different degrees of possession of the trait are focused
upon. For the same key, however, easiness estimates are somewhat
consistent, more so with older children.

Fourth, how stable are the results from the tests of fit across
time? In the pre- to post-comparisons of fit, 55% were in agreement.

Finally, how are the item mean squares related to factor loadings?
In almost all cases the item with the highest mean square was also
the item with the lowest loading. As items became more similar in
values on mean squares and on loadings, no relationship was apparent.